Decision Trees and Forests: A Probabilistic Perspective

Author

  • Balaji Lakshminarayanan
Abstract

Decision trees and ensembles of decision trees are very popular in machine learning and often achieve state-of-the-art performance on black-box prediction tasks. However, popular variants such as C4.5, CART, boosted trees and random forests lack a probabilistic interpretation, since they usually just specify an algorithm for training a model. We take a probabilistic approach in which we cast the decision tree structures and the parameters associated with the nodes of a decision tree as a probabilistic model; given labeled examples, we can train the probabilistic model using a variety of approaches (Bayesian learning, maximum likelihood, etc.). The probabilistic approach allows us to encode prior assumptions about tree structures and share statistical strength between node parameters; furthermore, it offers a principled mechanism to obtain probabilistic predictions, which is crucial for applications where uncertainty quantification is important. Existing work on Bayesian decision trees relies on Markov chain Monte Carlo, which can be computationally slow and suffer from poor mixing. We propose a novel sequential Monte Carlo algorithm that computes a particle approximation to the posterior over trees in a top-down fashion. We also propose a novel sampler for Bayesian additive regression trees by combining the above top-down particle filtering algorithm with the Particle Gibbs (Andrieu et al., 2010) framework. Finally, we propose Mondrian forests (MFs), a computationally efficient hybrid solution that is competitive with non-probabilistic counterparts in terms of speed and accuracy, but additionally produces well-calibrated uncertainty estimates. MFs use the Mondrian process (Roy and Teh, 2009) as the randomization mechanism and hierarchically smooth the node parameters within each tree (using a hierarchical probabilistic model and approximate Bayesian updates), but combine the trees in a non-Bayesian fashion.
MFs can be grown in an incremental/online fashion and, remarkably, the distribution of online MFs is the same as that of batch MFs.
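
The Mondrian process named in the abstract as the randomization mechanism can be illustrated with a minimal sketch. This is not the thesis's implementation: the names `sample_mondrian`, `leaves` and `budget` are hypothetical, and the sketch covers only the generic process on an axis-aligned box (exponential waiting time with rate equal to the sum of side lengths, cut dimension chosen in proportion to side length, cut location uniform). It leaves out everything specific to Mondrian forests, such as the use of training data, hierarchical smoothing of node parameters and online updates.

```python
import random


def sample_mondrian(lower, upper, budget, rng, time=0.0):
    """Recursively sample a Mondrian partition of the box [lower, upper].

    `budget` is the lifetime of the process: cuts whose time exceeds it
    are discarded, so a larger budget tends to give a finer partition.
    """
    lengths = [u - l for l, u in zip(lower, upper)]
    rate = sum(lengths)
    # Degenerate box: nothing left to cut.
    if rate <= 0.0:
        return {"leaf": True, "lower": lower, "upper": upper}
    # Waiting time to the next cut is exponential with rate = sum of sides.
    split_time = time + rng.expovariate(rate)
    if split_time > budget:
        return {"leaf": True, "lower": lower, "upper": upper}
    # Choose the cut dimension in proportion to its side length,
    # then choose the cut location uniformly along that side.
    dim = rng.choices(range(len(lengths)), weights=lengths)[0]
    cut = rng.uniform(lower[dim], upper[dim])
    left_upper = list(upper)
    left_upper[dim] = cut
    right_lower = list(lower)
    right_lower[dim] = cut
    return {
        "leaf": False,
        "dim": dim,
        "cut": cut,
        "time": split_time,
        "left": sample_mondrian(lower, left_upper, budget, rng, split_time),
        "right": sample_mondrian(right_lower, upper, budget, rng, split_time),
    }


def leaves(node):
    """Return the (lower, upper) corners of every leaf cell."""
    if node["leaf"]:
        return [(node["lower"], node["upper"])]
    return leaves(node["left"]) + leaves(node["right"])
```

For example, `leaves(sample_mondrian([0.0, 0.0], [1.0, 1.0], budget=2.0, rng=random.Random(0)))` returns the cells of one random partition of the unit square; the cells tile the square, so their areas sum to 1.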


Similar resources

Decision Forests with Oblique Decision Trees

Ensemble learning schemes have shown impressive increases in prediction accuracy over single model schemes. We introduce a new decision forest learning scheme, whose base learners are Minimum Message Length (MML) oblique decision trees. Unlike other tree inference algorithms, MML oblique decision tree learning does not over-grow the inferred trees. The resultant trees thus tend to be shallow and ...

Forests of Decision Trees

In many cases it is better to extract a set of decision trees and a set of possible logical data descriptions instead of a single model. Methods for creating forests of decision trees based on Separability of Split Value (SSV) criterion are presented. Preliminary results confirm their usefulness in understanding data structures.

Probabilistic analysis of the asymmetric digital search trees

In this paper, by applying three functional operators the previous results on the (Poisson) variance of the external profile in digital search trees will be improved. We study the profile built over $n$ binary strings generated by a memoryless source with unequal probabilities of symbols and use a combinatorial approach for studying the Poissonized variance, since the probability distribution o...

Design of Probabilistic Random Forests with Applications to Anticancer Drug Sensitivity Prediction

Random forests consisting of an ensemble of regression trees with equal weights are frequently used for design of predictive models. In this article, we consider an extension of the methodology by representing the regression trees in the form of probabilistic trees and analyzing the nature of heteroscedasticity. The probabilistic tree representation allows for analytical computation of confiden...

Changes in leaf nutrients of healthy and withered trees in the forests of Garan (in Marivan) and Dezlei (in Sarvabad) in Kurdistan province

Most macronutrient accumulation occurs in tree leaves and there is a strong relationship between leaf nutrient change and tree decline. The aim of this study was to investigate the changes in leaf nutrients in healthy and withered trees in the forests of Marivan and Sarvabad county in Kurdistan province. For this purpose, Garan (Marivan) with declining trees were selected as unhealthy site, whi...

Publication date: 2016